Performance reproducibility index for classification

نویسندگان

  • Mohammadmahdi R. Yousefi
  • Edward R. Dougherty
چکیده

MOTIVATION A common practice in biomarker discovery is to decide whether a large laboratory experiment should be carried out based on the results of a preliminary study on a small set of specimens. Consideration of the efficacy of this approach motivates the introduction of a probabilistic measure, for whether a classifier showing promising results in a small-sample preliminary study will perform similarly on a large independent sample. Given the error estimate from the preliminary study, if the probability of reproducible error is low, then there is really no purpose in substantially allocating more resources to a large follow-on study. Indeed, if the probability of the preliminary study providing likely reproducible results is small, then why even perform the preliminary study? RESULTS This article introduces a reproducibility index for classification, measuring the probability that a sufficiently small error estimate on a small sample will motivate a large follow-on study. We provide a simulation study based on synthetic distribution models that possess known intrinsic classification difficulties and emulate real-world scenarios. We also set up similar simulations on four real datasets to show the consistency of results. The reproducibility indices for different distributional models, real datasets and classification schemes are empirically calculated. The effects of reporting and multiple-rule biases on the reproducibility index are also analyzed. AVAILABILITY We have implemented in C code the synthetic data distribution model, classification rules, feature selection routine and error estimation methods. The source code is available at http://gsp.tamu.edu/Publications/supplementary/yousefi12a/.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Doppler-Derived Myocardial Performance Index in Healthy Children in Shiraz

Background: Assessment of myocardial function is essential in heart disease, but in regard to systolic and diastolic functions such evaluation has limitation. Ejection fraction is difficult to assess in abnormally-shaped ventricles, and diastolic inflow velocity pattern may be fused because of tachycardia. Objective: A myocardial performance index (MPI) or Tei index has been developed for adult...

متن کامل

An Integrated DEA and Data Mining Approach for Performance Assessment

This paper presents a data envelopment analysis (DEA) model combined with Bootstrapping to assess performance of one of the Data mining Algorithms. We applied a two-step process for performance productivity analysis of insurance branches within a case study. First, using a DEA model, the study analyzes the productivity of eighteen decision-making units (DMUs). Using a Malmquist index, DEA deter...

متن کامل

Body Mass Index Classification based on Facial Features using Machine Learning Algorithms for utilizing in Telemedicine

Background and Objectives: Due to the impact of controlling BMI on life, BMI classification based on facial features can be used for developing Telemedicine systems and eliminating the limitations of measuring tools, especially for paralyzed people. So that physicians can help people online during the Covid-19 pandemic. Method: In this study, new features and some previous work features were e...

متن کامل

Investigating the Reliability and Reproducibility of the International Caries Detection and Assessment System Index in Evaluation of Dental Decay in 25- to 40-Year-Old People and Comparing it with the Decayed, Missing, Filled Index

Background The Decayed, Missing, Filled (DMF) index is one of the most common techniques for investigating dental decay. In recent years, the new International Caries Detection and Assessment System  (ICDAS) technique has been introduced to the world as a standard tool to confront the present challenges in diagnosing dental decay. The present study aimed to investigate the reliability and repr...

متن کامل

Increasing the accuracy of the classification of diabetic patients in terms of functional limitation using linear and nonlinear combinations of biomarkers: Ramp AUC method

The Area under the ROC Curve (AUC) is a common index for evaluating the ability of the biomarkers for classification. In practice, a single biomarker has limited classification ability, so to improve the classification performance, we are interested in combining biomarkers linearly and nonlinearly. In this study, while introducing various types of loss functions, the Ramp AUC method and some of...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:
  • Bioinformatics

دوره 28 21  شماره 

صفحات  -

تاریخ انتشار 2012